Tackling Class Imbalance Problem in Software Defect Prediction Through Cluster-Based Over-Sampling With Filtering
Authors
Abstract
Similar resources
Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction
In classification or prediction tasks, data imbalance problem is frequently observed when most of instances belong to one majority class. Data imbalance problem has received considerable attention in machine learning community because it is one of the main causes that degrade the performance of classifiers or predictors. In this paper, we propose geometric mean based boosting algorithm (GMBoost...
Sampling Imbalance Dataset for Software Defect Prediction Using Hybrid Neuro-fuzzy Systems with Naive Bayes Classifier
Software defect prediction (SDP) is a process with difficult tasks in the case of software projects. The SDP process is useful for the identification and location of defects from the modules. This task will tend to become more costly with the addition of complex testing and evaluation mechanisms, when the software project modules size increases. Further measurement of ...
Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem
Software development projects inevitably accumulate defects throughout the development process. Due to the high cost that defects can incur, careful consideration is crucial when predicting which sections of code are likely to contain defects. Classification algorithms used in machine learning can be used to create classifiers which can be used sensitive classification methods attempt to make p...
Using Class Imbalance Learning for Cross-Company Defect Prediction
Cross-company defect prediction (CCDP) is a practical way that trains a prediction model by exploiting one or multiple projects of a source company and then applies the model to target company. Unfortunately, the performance of such CCDP models is susceptible to the high imbalanced nature between the defect-prone and non-defect classes of CC data. Class imbalance learning is applied to alleviat...
Support Vector Machines for Class Imbalance Rail Data Classification with Bootstrapping-Based Over-Sampling and Under-Sampling
Support Vector Machines (SVMs) is a popular machine learning technique, which has proven to be very effective in solving many classical problems with balanced data sets in various application areas. However, this technique is also said to perform poorly when it is applied to the problem of learning from heavily imbalanced data sets where the majority classes significantly outnumber the minority...
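Several of the entries above rely on over-sampling the minority (for example, defect-prone) class. The sketch below shows only the simplest variant, bootstrapping-based random over-sampling: minority instances are drawn with replacement until both classes have the same size. It is a generic illustration under the assumption of a binary label, not the cluster-based over-sampling with filtering proposed in the paper listed at the top, nor the exact procedure of the SVM paper. The function name `bootstrap_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_oversample(X, y, minority_label, seed=0):
    """Hypothetical helper: balance a binary dataset by drawing minority rows with replacement."""
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    minority = [x for x, label in zip(X, y) if label == minority_label]
    # Assumes a binary problem: every row not in the minority class is majority.
    majority_count = sum(1 for label in y if label != minority_label)
    deficit = majority_count - len(minority)
    if deficit <= 0 or not minority:
        # Already balanced (or no minority rows to copy): return copies unchanged.
        return list(X), list(y)
    # Bootstrap step: sample existing minority rows uniformly with replacement.
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(X) + extra, list(y) + [minority_label] * deficit


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
    y = [0, 0, 0, 0, 0, 1]  # five non-defective modules, one defective
    X_bal, y_bal = bootstrap_oversample(X, y, minority_label=1)
    print(Counter(y_bal))  # Counter({0: 5, 1: 5})
```

Because the added rows are exact copies, plain random over-sampling can encourage overfitting to the few minority examples; cluster-based and filtering variants such as the one named in the title aim to address this, but their details are not reproduced here.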
Journal
Journal title: IEEE Access
Year: 2019
ISSN: 2169-3536
DOI: 10.1109/access.2019.2945858